Extracting Formulaic and Free Text Clinical Research Articles Metadata using Conditional Random Fields
نویسندگان
چکیده
We explore the use of conditional random fields (CRFs) to automatically extract important metadata from clinical research articles. These metadata fields include formulaic metadata about the authors, extracted from the title page, as well as free text fields concerning the study’s critical parameters, such as longitudinal variables and medical intervention methods, extracted from the body text of the article. Extracting such information can help both readers conduct deep semantic search of articles and policy makers and sociologists track macro level trends in research. Preliminary results show an acceptable level of performance for formulaic metadata and a high precision for those found in the free text.
منابع مشابه
Segmentation of Document Using Discriminative Context-free Grammar Inference and Alignment Similarities
Text Documents present a great challenge to the field of document recognition. Automatic segmentation and layout analysis of documents is used for interpretation and machine translation of documents. Document such as research papers, address book, news etc. is available in the form of un-structured format. Extracting relevant Knowledge from this document has been recognized as promising task. E...
متن کاملAnnotated Bibliographical Reference Corpora in Digital Humanities
In this paper, we present new bibliographical reference corpora in digital humanities (DH) that have been developed under a research project, Robust and Language Independent Machine Learning Approaches for Automatic Annotation of Bibliographical References in DH Books supported by Google Digital Humanities Research Awards. The main target is the bibliographical references in the articles of Rev...
متن کاملExtracting Time Expressions from Clinical Text
Temporal information extraction is important to understanding text in clinical documents. Temporal expression extraction provides explicit grounding of events in a narrative. In this work we provide a direct comparison of various ways of extracting temporal expressions, using similar features as much as possible to explore the advantages of the methods themselves. We evaluate these systems on b...
متن کاملCitations in the Digital Library of Classics: Extracting Canonical References by Using Conditional Random Fields
Scholars of Classics cite ancient texts by using abridged citations called canonical references. In the scholarly digital library, canonical references create a complex textile of links between ancient and modern sources reflecting the deep hypertextual nature of texts in this field. This paper aims to demonstrate the suitability of Conditional Random Fields (CRF) for extracting this particular...
متن کاملExtracting Data Records from Unstructured Biomedical Full Text
In this paper, we address the problem of extracting data records and their attributes from unstructured biomedical full text. There has been little effort reported on this in the research community. We argue that semantics is important for record extraction or finer-grained language processing tasks. We derive a data record template including semantic language models from unstructured text and ...
متن کامل